This is the script for the (hopefully) final analysis for the publication. The script is organized according to the figures.

prep data

Fig S1&2

SM Fig S1A: overall acc

Table S1: Acc per experiment across all subjects
exp_name mean lo_ci hi_ci t df p.val ES
Exp. 1 60.8 58.7 63.0 10.2 31 0 10.1
Exp. 2 59.2 57.4 61.0 10.4 39 0 10.6
Exp. 3 58.9 57.5 60.4 12.3 38 0 13.0
combined 59.6 58.6 60.6 18.8 110 0 11.1

SM Fig S1B: Behavioral benchmarks of learning: accuracy on cue-valid > cue-invalid trials

Table S2 Diff Acc Cue valid vs Cue Invalid per experiment across all subjects
exp_name mean_diff lo_ci hi_ci t df p.val ES
Exp. 1 44.0 37.0 50.9 12.9 31 0 2.3
Exp. 2 46.4 40.1 52.7 14.8 39 0 2.3
Exp. 3 43.3 36.6 50.0 13.1 38 0 2.1
combined 44.6 40.9 48.3 23.7 110 0 2.3
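The per-subject test behind these rows can be sketched as follows (an illustrative Python sketch on simulated data; the real analysis was run in R, and the subject count and effect magnitudes here are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subj = 32
acc_valid = rng.normal(75, 8, n_subj)    # hypothetical % correct on cue-valid trials
acc_invalid = rng.normal(31, 8, n_subj)  # hypothetical % correct on cue-invalid trials

diff = acc_valid - acc_invalid
t, p = stats.ttest_1samp(diff, 0.0)           # one-sample t-test on the paired difference
cohen_d = diff.mean() / diff.std(ddof=1)      # effect size (Cohen's d)
lo, hi = stats.t.interval(0.95, n_subj - 1,
                          loc=diff.mean(), scale=stats.sem(diff))  # 95% CI of the mean diff
```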

SM Figure S2 (Left): learning curve of acc- all Subjects

Note! This is the graph with all of the subjects.

SM Figure S2 (Right): learning curve with only pre-registered subjects & trials

Rerun the same graph, but only with the pre-registered subjects.

Fig 1

Fig 1 C Hypothetical results

Fig 2

Figure 2A (right): Gaze follows explicit response (ggpaired)

Diff gaze direction by prediction per experiment (pre-reg)
exp_name mean_diff lo_ci hi_ci t df p.val ES
Exp. 1 0.95 0.80 1.10 12.79 24 0 2.56
Exp. 2 0.90 0.79 1.01 16.93 30 0 3.04

Figure 2B (left): Gaze follows explicit response (timecourse)

Now let’s try to combine timecourse and ggpaired using cowplot.

Fig 2D Same/Diff Gaze & Explicit response

Here we are looking at Same/Diff of the eye as binary response

Convergence/divergence of Eye & Response
exp_name eye_resp_match mean_porp lo_ci hi_ci
Exp. 1 diff 26.03 21.96 30.09
Exp. 1 same 73.97 69.91 78.04
Exp. 2 diff 28.36 25.20 31.52
Exp. 2 same 71.64 68.48 74.80
Exp.1&2 diff 27.32 24.86 29.77
Exp.1&2 same 72.68 70.23 75.14

We see that around 70% of explicit & eye responses are the same, but there is also considerable between-subject variance.

In a way this is similar to our finding that Gaze Prediction values are robustly positive at the subject level.

Fig 2D: sRW trajectory and Gaze Prediction

First we’ll do the pre-reg analysis (individual Ss’ correlations). Here we are using the sticky RW model; see below for the use of the pre-registered RW model.

print stats of correlation

correlation EM and learning metric RWS
exp_name vars_compared mean_corr hi_corr lo_corr group_t_statistic group_pval ES n_sig total_sub
Exp. 1 PC_pred_gz 0.131 0.188 0.075 4.797 0 0.959 11 25
Exp. 2 PC_pred_gz 0.106 0.160 0.051 3.962 0 0.712 11 31
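The per-subject correlation approach can be sketched like this (illustrative Python on simulated trial data; the strength of the simulated link and the trial counts are hypothetical, and the actual analysis was run in R):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subj, n_trials = 25, 300

subj_corrs = []
for _ in range(n_subj):
    p_choice = rng.uniform(0, 1, n_trials)                    # model-derived P. Choice
    gaze_pred = 0.5 * p_choice + rng.normal(0, 1, n_trials)   # weakly linked gaze measure
    r, _ = stats.pearsonr(p_choice, gaze_pred)
    subj_corrs.append(r)                                      # one correlation per subject

subj_corrs = np.asarray(subj_corrs)
t, p = stats.ttest_1samp(subj_corrs, 0.0)        # group-level test on the r distribution
es = subj_corrs.mean() / subj_corrs.std(ddof=1)  # Cohen's d
```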

Now let’s make the figures

RWS corr Gaze Prediction and P. choice using mixed models
term npar AIC BIC logLik deviance statistic df p.value
pc_bygz_pred_null_mdl.res 3 -1109.195 -1086.663 557.597 -1115.195 NA NA NA
pc_bygz_pred_full_mdl.res 4 -1251.601 -1221.559 629.801 -1259.601 144.406 1 0
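As a sanity check, the likelihood-ratio test in this table can be reproduced from the printed log-likelihoods and parameter counts alone (a minimal Python sketch; the mixed models themselves were fit in R):

```python
from scipy import stats

# Values printed above for the null and full models
logLik_null, npar_null = 557.597, 3
logLik_full, npar_full = 629.801, 4

chisq = 2 * (logLik_full - logLik_null)  # LRT statistic ("statistic" column)
df = npar_full - npar_null               # one extra parameter
p = stats.chi2.sf(chisq, df)             # "p.value" column (rounds to 0)

aic_full = 2 * npar_full - 2 * logLik_full  # AIC = 2k - 2*logLik
```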

Fig 2C: Plotting trajectories of individual Ss

SM Figures S3 & S4

Fig S3: Individual Subject Gaze by Prediction

Now let’s do the stats

Gaze by Prediction, single-subject significance
exp_name n_sig total_n
Exp. 1 23 25
Exp. 2 31 31
Exp. 3 30 32

F#$%ing amazing

SM Fig S4: convergence and divergence per experiment

SM Fig S6 P. Choice from RW & corr to Gaze Prediction

print stats of pre-reg analysis

correlation EM and learning metric RW (PreReged)
exp_name vars_compared mean_corr hi_corr lo_corr group_t_statistic group_pval ES n_sig total_sub
Exp. 1 PC_pred_gz 0.154 0.214 0.095 5.349 0.000 1.070 14 25
Exp. 1 da_pred_gz -0.099 -0.048 -0.150 -4.042 0.000 -0.808 13 25
Exp. 1 distv5_pred_gz 0.091 0.142 0.041 3.724 0.001 0.745 10 25
Exp. 2 PC_pred_gz 0.133 0.187 0.080 5.064 0.000 0.910 15 31
Exp. 2 da_pred_gz -0.107 -0.061 -0.153 -4.782 0.000 -0.859 9 31
Exp. 2 distv5_pred_gz 0.086 0.127 0.045 4.263 0.000 0.766 10 31
Exp. 3 PC_pred_gz 0.065 0.097 0.032 4.088 0.000 0.723 5 32
Exp. 3 da_pred_gz -0.043 -0.013 -0.073 -2.900 0.007 -0.513 3 32
Exp. 3 distv5_pred_gz 0.045 0.071 0.019 3.539 0.001 0.626 2 32
RW
term npar AIC BIC logLik deviance statistic df p.value
pc_bygz_pred_null_mdl.res_rw 3 -1363.147 -1340.615 684.574 -1369.147 NA NA NA
pc_bygz_pred_full_mdl.res_rw 4 -1543.632 -1513.590 775.816 -1551.632 182.485 1 0

So lme gives similarly significant results using the RW model.

Figure 3

Examining confidence benchmarks

Panel A: Benchmark 1, monotonic rise in accuracy

Conf benchmark 1 distribution of spearman correlation
exp_name mean_corr hi_corr lo_corr group_t_statistic group_pval ES
Exp. 1 0.3327 0.5266 0.1388 3.5416 0.0017 0.7083
Exp. 2 0.2823 0.4516 0.1130 3.4060 0.0019 0.6117
Exp. 3 0.1767 0.3595 -0.0062 1.9707 0.0577 0.3484

So these results aren’t super strong. But this is actually a pretty poor way of testing this hypothesis.

Let’s try now using a logistic regression approach

Conf benchmark 1 with glmer
effect group term estimate std.error statistic p.value exp_name
fixed NA (Intercept) 1.04042 0.06487 16.03931 0 Exp. 1
fixed NA m_pred_gz 0.26507 0.04654 5.69518 0 Exp. 1
fixed NA (Intercept) 1.06429 0.07339 14.50270 0 Exp. 2
fixed NA m_pred_gz 0.26613 0.04153 6.40856 0 Exp. 2

So this gives a really robust result that is also statistically much more appropriate.
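A fixed-effects-only version of that logistic regression can be sketched as follows (Python on simulated data; the reported models are glmer fits with a random intercept per subject, which this sketch omits, and the generating coefficients are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(2)
n = 2000
gz = rng.normal(0, 1, n)                       # standardized gaze-prediction score
acc = rng.binomial(1, expit(1.0 + 0.25 * gz))  # accuracy generated with intercept 1.0, slope 0.25

X = np.column_stack([np.ones(n), gz])

def nll(beta):
    """Bernoulli negative log-likelihood of the logistic model."""
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta) - acc * eta)

fit = minimize(nll, x0=np.zeros(2), method="L-BFGS-B")
intercept, slope = fit.x  # should land near the generating values
```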

Visualize Panel A: Benchmark 1 (right = Exp. 1, left = Exp. 2)

Panel B: Benchmark 2 - folded X pattern

We are binning by expectation strength & rule accuracy

Here again we can do either with RW or RWS

Running with RWS

RWS lme results Benchmark 2
term npar AIC BIC logLik deviance statistic df p.value exp
bench2_full_exp1.res_rws 6 9589.991 9627.512 -4788.995 9577.991 9.3206 1 0.0023 Exp. 1
bench2_full_exp2.res_rws 6 12097.353 12136.208 -6042.677 12085.353 20.6676 1 0.0000 Exp. 2

SM table B2: pairwise comparison of Benchmark 2, RWS
exp_name dist_v_rule_acc_bin lo_ci_0 lo_ci_1 hi_ci_0 hi_ci_1 m_gz2pred_0 m_gz2pred_1 p_val t df
Exp. 1 1 0.6197 0.4296 0.1719 0.2147 0.3958 0.3222 0.3643 0.9272 21
Exp. 1 2 0.3668 0.5646 0.0047 0.3723 0.1858 0.4685 0.0203 -2.5109 21
Exp. 1 3 0.4852 0.6189 0.0877 0.4004 0.2864 0.5096 0.0536 -2.0509 20
Exp. 1 4 0.6779 0.6753 0.2502 0.4599 0.4641 0.5676 0.5779 -0.5657 20
Exp. 1 5 0.3692 0.7310 -0.0447 0.5353 0.1622 0.6332 0.0003 -4.3777 20
Exp. 1 6 0.4522 0.6906 0.0769 0.4552 0.2645 0.5729 0.0017 -3.6473 19
Exp. 2 1 0.4922 0.4171 0.1195 0.2187 0.3059 0.3179 0.7444 -0.3296 25
Exp. 2 2 0.5976 0.5772 0.2794 0.3680 0.4385 0.4726 0.7044 -0.3838 25
Exp. 2 3 0.4228 0.5857 0.1608 0.4150 0.2918 0.5004 0.0036 -3.2260 24
Exp. 2 4 0.3923 0.5900 0.1055 0.4389 0.2489 0.5144 0.0024 -3.3856 24
Exp. 2 5 0.4085 0.6051 0.0874 0.4425 0.2480 0.5238 0.0012 -3.6888 23
Exp. 2 6 0.4586 0.7117 0.0479 0.5172 0.2532 0.6144 0.0041 -3.1869 23
Exp. 3 1 0.5340 0.5032 0.2199 0.2778 0.3770 0.3905 0.8817 -0.1501 30
Exp. 3 2 0.3810 0.5500 0.1437 0.3749 0.2623 0.4624 0.0077 -2.8643 29
Exp. 3 3 0.5129 0.5076 0.2304 0.3198 0.3716 0.4137 0.7883 -0.2711 28
Exp. 3 4 0.4813 0.5710 0.1807 0.4076 0.3310 0.4893 0.0614 -1.9490 28
Exp. 3 5 0.5113 0.6654 0.1951 0.5070 0.3532 0.5862 0.0018 -3.4842 26
Exp. 3 6 0.7201 0.6172 0.3931 0.3989 0.5566 0.5081 0.4961 0.6911 24

Fig 3C: Benchmark 3 - steeper curve for high Gaze prediction trials

RWS

Benchmark 3 RWS (glmer)
effect group term estimate std.error statistic p.value exp_name
fixed NA (Intercept) 0.4268423 0.2594215 1.6453625 0.0998951 Exp. 1
fixed NA pred_gz_bin -0.2304368 0.1659172 -1.3888657 0.1648736 Exp. 1
fixed NA dist_v_bin 0.1027834 0.0735716 1.3970534 0.1623975 Exp. 1
fixed NA pred_gz_bin:dist_v_bin 0.1531692 0.0484909 3.1587214 0.0015846 Exp. 1
fixed NA (Intercept) 0.3611247 0.2375613 1.5201329 0.1284776 Exp. 2
fixed NA pred_gz_bin -0.1079814 0.1492352 -0.7235651 0.4693328 Exp. 2
fixed NA dist_v_bin 0.0677893 0.0652881 1.0383099 0.2991258 Exp. 2
fixed NA pred_gz_bin:dist_v_bin 0.1554912 0.0431693 3.6018969 0.0003159 Exp. 2

Fig S7

Exp 1 & 2 confidence benchmarks using pre-reg’ed RW

RW lme results Benchmark 2
term npar AIC BIC logLik deviance statistic df p.value exp
bench2_full_exp1.res_rw 6 9595.689 9633.21 -4791.844 9583.689 11.737 1 6e-04 Exp. 1
bench2_full_exp2.res_rw 6 12086.919 12125.77 -6037.460 12074.919 29.625 1 0e+00 Exp. 2

Visualize Benchmark 2

SM table B2: pairwise comparison of Benchmark 2, RW
exp_name dist_v_rule_acc_bin lo_ci_0 lo_ci_1 hi_ci_0 hi_ci_1 m_gz2pred_0 m_gz2pred_1 p_val t df
Exp. 1 1 0.4639 0.4240 0.1268 0.2722 0.2953 0.3481 0.6029 -0.5282 21
Exp. 1 2 0.5229 0.5602 0.0686 0.3715 0.2958 0.4658 0.2710 -1.1306 21
Exp. 1 3 0.6008 0.5935 0.2224 0.3889 0.4116 0.4912 0.6721 -0.4296 20
Exp. 1 4 0.6085 0.7024 0.0990 0.4840 0.3537 0.5932 0.1040 -1.7032 20
Exp. 1 5 0.3138 0.7125 -0.1024 0.5203 0.1057 0.6164 0.0006 -4.0432 20
Exp. 1 6 0.4358 0.6800 0.0900 0.4347 0.2629 0.5574 0.0022 -3.5383 19
Exp. 2 1 0.5053 0.4405 0.1972 0.2741 0.3512 0.3573 0.8825 -0.1494 25
Exp. 2 2 0.5563 0.5297 0.2410 0.3441 0.3987 0.4369 0.5895 -0.5467 25
Exp. 2 3 0.5025 0.5171 0.2609 0.3646 0.3817 0.4408 0.1374 -1.5367 24
Exp. 2 4 0.4258 0.5757 0.0687 0.4071 0.2473 0.4914 0.0165 -2.5784 24
Exp. 2 5 0.4448 0.6902 0.0721 0.5343 0.2584 0.6123 0.0008 -3.8352 23
Exp. 2 6 0.4000 0.6955 0.0206 0.5148 0.2103 0.6052 0.0021 -3.4659 23
Exp. 3 1 0.5372 0.5230 0.2961 0.3385 0.4167 0.4308 0.8461 -0.1958 30
Exp. 3 2 0.4343 0.5316 0.1244 0.3322 0.2793 0.4319 0.0501 -2.0441 29
Exp. 3 3 0.5356 0.5587 0.2212 0.3587 0.3784 0.4587 0.3228 -1.0065 28
Exp. 3 4 0.4498 0.5898 0.1374 0.3825 0.2936 0.4862 0.0347 -2.2194 28
Exp. 3 5 0.5937 0.6257 0.2290 0.4684 0.4114 0.5471 0.0577 -1.9856 26
Exp. 3 6 0.6315 0.5867 0.2894 0.3983 0.4605 0.4925 0.7537 -0.3174 24

Table S3 Benchmark 2 pairwise comparison

Table S3, Pairwise comparison Benchmark 2
expectation_bin statistic df p p.adj model
1 0.3249 53 0.7470 0.7470 sRW
2 -2.5301 53 0.0140 0.0140 sRW
3 -3.2031 53 0.0020 0.0020 sRW
4 -3.0650 53 0.0030 0.0030 sRW
5 -5.8861 53 0.0000 0.0000 sRW
6 -3.8821 53 0.0003 0.0003 sRW
1 -0.6510 53 0.5180 0.5180 RW
2 -1.1333 53 0.2620 0.2620 RW
3 -1.8426 53 0.0710 0.0710 RW
4 -3.5504 53 0.0008 0.0008 RW
5 -6.6255 53 0.0000 0.0000 RW
6 -4.2851 53 0.0001 0.0001 RW

RW

Fig S7 Right

Benchmark 3 (glmer)
effect group term estimate std.error statistic p.value exp_name
fixed NA (Intercept) 0.0553951 0.2609975 0.2122439 0.8319167 Exp. 1
fixed NA pred_gz_bin -0.0926035 0.1660699 -0.5576175 0.5771056 Exp. 1
fixed NA dist_v_bin 0.2221092 0.0757746 2.9311819 0.0033768 Exp. 1
fixed NA pred_gz_bin:dist_v_bin 0.1109852 0.0495981 2.2376908 0.0252412 Exp. 1
fixed NA (Intercept) 0.3254890 0.2381790 1.3665733 0.1717591 Exp. 2
fixed NA pred_gz_bin -0.1724367 0.1493580 -1.1545190 0.2482875 Exp. 2
fixed NA dist_v_bin 0.0763765 0.0663759 1.1506672 0.2498692 Exp. 2
fixed NA pred_gz_bin:dist_v_bin 0.1793884 0.0440190 4.0752545 0.0000460 Exp. 2

Table S4 Benchmark 3 pairwise comparison

Table S4, Pairwise comparison Benchmark 3
expectation_bin statistic df p p.adj model
1 1.307 53 0.197 0.197 RWS
2 -2.185 53 0.033 0.033 RWS
3 -3.752 53 0.000 0.000 RWS
4 -2.997 53 0.004 0.004 RWS
5 -3.555 53 0.001 0.001 RWS
6 -3.643 53 0.001 0.001 RWS
1 0.819 53 0.417 0.417 RW
2 -1.592 53 0.117 0.117 RW
3 -3.747 53 0.000 0.000 RW
4 -4.979 53 0.000 0.000 RW
5 -3.367 53 0.001 0.001 RW
6 -2.354 53 0.022 0.022 RW

Fig 4

First let’s examine the relation between explicit confidence and P. Choice.

Correlation of Confidence and RWS trial-by-trial trajectories

correlation Confidence Ratings and sRW trajectories
vars_compared mean_corr hi_CI lo_CI group_t_statistic group_pval ES n_sig total_sub
conf_PC 0.384 0.438 0.330 14.468 0 2.558 30 32
conf_prev_delta -0.266 -0.195 -0.337 -7.612 0 -1.346 22 32

So there is a super robust correlation between P. Choice and confidence ratings; 30/32 Ss are significant!

Figure 4A

Correlation of confidence & Gaze

Experiment 3 correlation Confidence Ratings and EM
vars_compared mean_corr lo_corr hi_corr group_t_statistic group_pval ES n_sig total_sub
conf_gz2pred 0.0756 0.038 0.1132 4.1023 3e-04 0.7252 8 32

Mixed model examining relation between Gaze, confidence & other factors

So confidence & gaze are correlated, but we also know that both are correlated with a bunch of other factors (most notably P. Choice and accuracy). So let’s use mixed models to try to tease these factors apart.

MM results of gaze by confidence & other variables
term npar AIC BIC logLik deviance statistic df p.value
gz_conf_mdl2_full.res 7 11892.77 11938.18 -5939.387 11878.77 9.5152 1 0.002

This is the formula used: m_pred_gz ~ z_conf_rating + expl.RWS.p_choice + resp_rule_acc + expl.RWS.prev_delta + (1 | sub_name)

Fig 4 B

Comparing metacognition of eye vs. explicit confidence ratings.

From the OSF (Pre-reg Hypothesis 2): “We will compare the metacognitive sensitivity of explicit confidence ratings and Gaze2Prediction. It should be noted that in this experimental paradigm both measures are obtained for each trial, ensuring that first-order performance is identical and controlled for. Accordingly, we use delta confidence, the difference of the mean confidence rating for correct vs. incorrect trials (accuracy defined as rule accuracy) as a straight-forward index of metacognition well suited to deal with continuous confidence ratings (Rahnev, D., 2023). To ensure that differences between the measures does not arise from differences in the measurement scale we will use a transformation of Gaze2Prediction into eye “confidence rating” (see “Indices”). To compare the two measures we will perform a paired t-test on the delta confidence for the two measures. Bayesian analyses will be used to examine a potential null finding regarding the difference between the measures with a Bayes Factor <.33 considered as moderate evidence (Wagenmakers et al., 2018).”
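As a concrete illustration of the delta-confidence index described in the quote (a Python sketch on hypothetical single-subject data; the trial count and confidence distributions are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
correct = rng.binomial(1, 0.6, n).astype(bool)  # rule accuracy per trial
conf = np.where(correct,
                rng.normal(0.6, 1.0, n),        # higher confidence on correct trials
                rng.normal(0.0, 1.0, n))        # lower confidence on errors

# Delta confidence: mean confidence (correct) - mean confidence (incorrect)
delta_conf = conf[correct].mean() - conf[~correct].mean()
```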

So we see a big win for explicit ratings! They show much higher metacognitive sensitivity.

ttest results of delta confidence expl vs. eye
estimate conf.low conf.high statistic p.value parameter effsize
0.481 0.353 0.608 7.685 0 31 1.358

stats of each measure’s metacognitive sensitivity

i.e. delta confidence >0

explicit/eye confidence one sample ttest
source mean lo_ci hi_ci statistic p_val ES
expl 0.583 0.469 0.697 10.437 0.000 1.845
eye 0.103 0.034 0.171 3.066 0.004 0.542

Fig 4C: Compare Acc by source

Now we can look at how accurate each source is

Look at overall acc by source

overall acc by source per exp
exp_name source mean_acc acc_lo_ci acc_hi_ci
Exp. 1 eye 65.198 62.209 68.186
Exp. 1 resp 75.668 73.168 78.168
Exp. 2 eye 63.919 61.998 65.840
Exp. 2 resp 75.740 73.099 78.381
Exp 1& 2 eye 64.490 62.834 66.146
Exp 1& 2 resp 75.708 73.926 77.490
stats of acc by source (exp 1& 2 comb)
p.val t df ES
0 -13.127 55 -1.754
diff acc by source (exp 1&2)
mean_diff diff_lo_ci diff_acc_hi_ci
11.634 9.962 13.305
one sample ttest vs. chance by source
source p.val t df ES
eye 0 17.534 55 2.343
resp 0 28.909 55 3.863

So Eye accuracy is quite good, ~65%!!

Figure 4D: Learning curve by source

Fig S8

Examine the correlation between explicit and ocular metacognition across Ss

stats of correlation between explicit and ocular metacognition
r pval df BF
0.09 0.63 31 0.43

This is interesting! Eye metacognition & explicit metacognition seem to be somewhat unrelated abilities.

BF is inconclusive

Need to check this with a Bayesian null analysis; the sample size is still small, so probably think of a way to show at the trial level that the two diverge.

further test for correlation between metacognition across SS

So let’s also try controlling for each S’s rule accuracy.

expl. meta by eye meta: rule acc
term estimate std.error statistic p.value
(Intercept) -0.930 0.792 -1.175 0.250
diff_rating_eye 1.483 3.928 0.378 0.709
m_acc 2.029 1.062 1.912 0.066
diff_rating_eye:m_acc -1.808 5.117 -0.353 0.727

Fig 5

Fig 5A: First we get overall rate of switching

Overall Rule switching by source
exp_name source mean_porp lo_ci hi_ci
Exp. 1 eye 38.53 35.20 41.85
Exp. 1 response 20.25 16.99 23.52
Exp. 2 eye 40.87 38.27 43.47
Exp. 2 response 21.07 17.31 24.83
Exp. 3 eye 42.06 39.60 44.51
Exp. 3 response 22.22 19.54 24.90
Exp.1 &2 eye 39.82 37.80 41.85
Exp.1 &2 response 20.71 18.24 23.17
Stats of Overall Rule switching by source
p.val t df ES
0 16.34 55 2.183

So the eyes are significantly more switch-y!!

Figure 5b - Winning model for Eye vs. Response

rmANOVA results of models’ BIC per source
Effect DFn DFd F p p<.05 pes source
model 1.24 68.02 112.212 0e+00 * 0.671 expl
model 1.43 78.59 9.657 8e-04 * 0.149 Eye
pairwise comparison of model’s BIC per source
source .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
expl BIC RW RWS 56 56 3.6220 55 0.0006 0.0020 **
expl BIC RW WSLS 56 56 -11.0210 55 0.0000 0.0000 ****
expl BIC RWS WSLS 56 56 -10.9932 55 0.0000 0.0000 ****
eye BIC RW RWS 56 56 -4.6636 55 0.0000 0.0001 ****
eye BIC RW WSLS 56 56 -4.1242 55 0.0001 0.0004 ***
eye BIC RWS WSLS 56 56 -1.0241 55 0.3100 0.9300 ns
summary stats of BIC per model & source
source model variable n mean ci
expl RW BIC 56 143.728 10.217
expl RWS BIC 56 136.095 11.070
expl WSLS BIC 56 194.494 4.482
eye RW BIC 56 199.120 6.094
eye RWS BIC 56 205.094 6.753
eye WSLS BIC 56 207.582 3.243
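For reference, the BIC values being compared here follow the standard definition (a generic sketch; `n_obs` would be the number of trials entering each subject's fit):

```python
import numpy as np

def bic(log_lik, n_params, n_obs):
    """BIC = k*ln(n) - 2*logLik; lower values indicate the better-fitting model."""
    return n_params * np.log(n_obs) - 2 * log_lik
```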

Small detour to visualize the model-comparison metrics

SM Fig S5: barplot with dots of values, faceted

Figure 5c- Comparison of sRW model parameters by source

This looks pretty good

Now let’s get stats of the parameter comparison by source, without excluding any outliers.

summary stats of model by source
model param source mean lo_ci hi_ci onesample_t onesample_pval ES
RW alpha expl 0.2584 0.2927 0.2241 15.1028 0.0000 2.0182
RW alpha eye 0.3699 0.4454 0.2944 9.8158 0.0000 1.3117
RW beta expl 4.3633 4.9942 3.7323 13.8584 0.0000 1.8519
RW beta eye 2.7004 3.9514 1.4493 4.3257 0.0001 0.5781
RWS alpha expl 0.3124 0.3641 0.2607 12.1044 0.0000 1.6175
RWS alpha eye 0.3293 0.4112 0.2474 8.0574 0.0000 1.0767
RWS beta expl 12.4515 21.2807 3.6223 2.8262 0.0066 0.3777
RWS beta eye 9.1133 14.0180 4.2086 3.7237 0.0005 0.4976
RWS rho expl 1.4259 2.2659 0.5859 3.4019 0.0013 0.4546
RWS rho eye 0.2108 0.2743 0.1472 6.6487 0.0000 0.8885
WSLS eps expl 0.5816 0.6120 0.5512 38.3080 0.0000 5.1191
WSLS eps eye 0.6885 0.7195 0.6575 44.4917 0.0000 5.9455
pairwise comparison of model param by source
model param t_2sample p_2sample df is_sig
RW alpha -3.6034 0.0007 55 1
RW beta 2.4893 0.0159 55 1
RWS alpha -0.4167 0.6785 55 0
RWS beta 0.6510 0.5178 55 0
RWS rho 2.9029 0.0053 55 1
WSLS eps -6.3743 0.0000 55 1
BF Stats of RWS alpha by source (Exp 1&2)
Model log_BF BF10 BF01
Null, mu=0 0.000 1.000 1.000
Alt., r=0.707 -1.842 0.159 6.309
BF Stats of RWS BETA by source (Exp 1&2)
Model log_BF BF10 BF01
Null, mu=0 0.000 1.000 1.000
Alt., r=0.707 -1.723 0.179 5.602

Fig. 5D: divergence rate of eye & explicit by P.choice

Let’s look at exploration vs. exploitation.

summary stats mean diverge by explor/exploit
explor_exploit variable n mean ci lo_ci hi_ci
exploit porp 56 25.560 2.352 23.208 27.912
explor porp 54 35.471 4.740 30.731 40.211
paired ttest of diverge by explor/exploit
statistic p.value parameter ES
-4.1347 1e-04 53 -0.5627

This is really neat

SM Fig S9-S13

Here we are ruling out alternative explanations for the enhanced ocular switchiness.

Plot Fig S9A: Motor Switching as opposed to rule switching

Until now we looked at rule switching, but perhaps the eyes are more/less switch-y R/L (i.e., if t−1 = R, then t is also R regardless of cue), and this results in more rule switches.

Overall Motor switching by source
exp_name source mean_porp lo_ci hi_ci
Exp. 1 eye 0.46 0.43 0.49
Exp. 1 resp 0.47 0.43 0.50
Exp. 2 eye 0.47 0.45 0.49
Exp. 2 resp 0.47 0.44 0.50
Exp. 3 eye 0.47 0.45 0.49
Exp. 3 resp 0.54 0.52 0.55
Exp.1 &2 eye 0.47 0.45 0.48
Exp.1 &2 resp 0.47 0.45 0.49
Stats of Overall Motor switching by source (Exp 1&2)
p.val t df ES
0.854 -0.185 55 -0.025
BF Stats of Overall Motor switching by source (Exp 1&2)
Model log_BF BF10 BF01
Null, mu=0 0.000 1.000 1.000
Alt., r=0.707 -1.908 0.148 6.743

So there is no difference in general switching of the response. This is pretty surprising! So the eyes’ enhanced switchiness can’t be explained by “simple” perseveration (i.e., if the previous trial was “R”, the next trial will also be “R”).

Fig. S9B: Effect of Rule switching and previous trial’s ACC

Here we are looking at each source contained within itself. So for the eye: (P. Switch Eye | prev. trial’s eye-response acc) vs. (P. Switch Resp | prev. trial’s explicit-response acc).

ANOVA Table (type III tests)
Effect DFn DFd F p p<.05 pes
prev_acc 1 55 451.598 3.43e-28 * 0.8910
source 1 55 210.793 1.83e-20 * 0.7930
prev_acc:source 1 55 0.011 9.17e-01 0.0002
Summary Stats of P. Rule switching by source & prev trial acc
exp_name source prev_acc mean_pswitch lo_ci hi_ci
Exp. 1 Explicit 0 38.890 32.195 45.586
Exp. 1 Explicit 1 8.870 6.432 11.308
Exp. 1 Gaze 0 54.818 51.529 58.107
Exp. 1 Gaze 1 26.212 21.750 30.673
Exp. 2 Explicit 0 36.976 31.450 42.502
Exp. 2 Explicit 1 10.592 7.515 13.670
Exp. 2 Gaze 0 56.211 53.070 59.351
Exp. 2 Gaze 1 28.308 25.143 31.473
Exp. 3 Explicit 0 43.720 37.584 49.856
Exp. 3 Explicit 1 8.281 5.906 10.656
Exp. 3 Gaze 0 54.725 51.611 57.838
Exp. 3 Gaze 1 32.504 28.309 36.699
Exp 1&2 Explicit 0 37.831 33.686 41.975
Exp 1&2 Explicit 1 9.823 7.849 11.798
Exp 1&2 Gaze 0 55.589 53.378 57.800
Exp 1&2 Gaze 1 27.372 24.794 29.950
RM-ANOVA of Rule switching by source & prev trial acc (Exp. 1&2)
Effect DFn DFd F p p<.05 pes
prev_acc 1 55 451.598 0.000 * 0.891
source 1 55 210.793 0.000 * 0.793
prev_acc:source 1 55 0.011 0.917 0.000
Bayesian rmANOVA for rule switching by prev_acc & source
Model log_BF
source + prev_acc + source:prev_acc + sub_name 0.000
source + prev_acc + sub_name 1.632
source + source:prev_acc + sub_name -126.940
prev_acc + source:prev_acc + sub_name -68.879

So there doesn’t seem to be a significant interaction between source & the previous trial’s accuracy. So the eyes are not more switch-y following errors, relative to the explicit response.

Let’s try plotting it

So there is no interaction: it isn’t that the eyes switch more after an error, as opposed to explicit responses.

Diving into beta differences & Fig S10

Because of the skewness of the beta values we test in a number of ways: (1) a regular t-test, (2) a t-test of log values, (3) a non-parametric Wilcoxon test, (4) bootstrapping of the difference, (5) a t-test of beta when fitted with an upper boundary at 15.

different statistical tests of beta by source
statistic p.value method
0.651 0.5178 reg. ttest
1.8084 0.0760 log-transform ttest
493 0.0130 Wilcoxon signed rank test with continuity correction
0.1398 0.8893 ttest winsorize beta (20)
-4.5,16.335 0.5562 Bootstrap (bca)
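The battery of tests can be sketched like this (illustrative Python on simulated right-skewed betas; the distributions are hypothetical, and clipping at 15 merely stands in for refitting the model with a bounded beta):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 56
beta_expl = rng.lognormal(1.5, 1.0, n)  # right-skewed betas, explicit source
beta_eye = rng.lognormal(1.3, 1.0, n)   # right-skewed betas, eye source
diff = beta_expl - beta_eye

t_raw = stats.ttest_rel(beta_expl, beta_eye)                  # 1. regular paired t-test
t_log = stats.ttest_rel(np.log(beta_expl), np.log(beta_eye))  # 2. t-test on log values
w = stats.wilcoxon(beta_expl, beta_eye)                       # 3. Wilcoxon signed-rank
boot = np.array([rng.choice(diff, n, replace=True).mean()
                 for _ in range(2000)])                       # 4. bootstrap the mean diff
ci = np.percentile(boot, [2.5, 97.5])
t_wins = stats.ttest_rel(np.clip(beta_expl, None, 15),
                         np.clip(beta_eye, None, 15))         # 5. cap extreme betas
```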

SM Figure S10 beta comparison unfiltered

Fig S11

So let’s look per model & parameter

model parameters correlation by source
model param var1 var2 cor statistic p conf.low conf.high method
RW alpha expl eye 0.590 5.317 0.000 0.382 0.736 Pearson
RW beta expl eye 0.110 0.801 0.427 -0.159 0.361 Pearson
RW beta_wins expl eye 0.170 1.287 0.204 -0.095 0.416 Pearson
RW log_beta expl eye 0.360 2.780 0.008 0.101 0.568 Pearson
RWS alpha expl eye 0.330 2.573 0.013 0.074 0.546 Pearson
RWS beta expl eye -0.042 -0.306 0.761 -0.301 0.224 Pearson
RWS beta_wins expl eye -0.029 -0.212 0.833 -0.290 0.236 Pearson
RWS log_beta expl eye 0.060 0.440 0.662 -0.208 0.320 Pearson
RWS rho expl eye 0.055 0.407 0.686 -0.211 0.314 Pearson
WSLS eps expl eye 0.400 3.221 0.002 0.155 0.601 Pearson
model parameters correlation by source (RWS only)
model param var1 var2 cor statistic p conf.low conf.high method
RWS alpha expl eye 0.330 2.573 0.013 0.074 0.546 Pearson
RWS beta expl eye -0.042 -0.306 0.761 -0.301 0.224 Pearson
RWS beta_wins expl eye -0.029 -0.212 0.833 -0.290 0.236 Pearson
RWS log_beta expl eye 0.060 0.440 0.662 -0.208 0.320 Pearson
RWS rho expl eye 0.055 0.407 0.686 -0.211 0.314 Pearson
BF01 for null corr of RWS model parameters by source
param BF01
rho 3.072
beta 3.173
log beta 3.009
Winsorized beta 3.244

Figure S12: Accuracy on diverging and converging trials

Here we look at whether response accuracy tends to be higher when eye/explicit responses converge vs. diverge.

summary stats of acc by conv/div & source
exp_name eye_resp_match source mean_acc lo_ci hi_ci
Exp. 1 diff eye 0.291 0.253 0.329
Exp. 1 diff resp 0.709 0.671 0.747
Exp. 1 same eye 0.774 0.749 0.798
Exp. 1 same resp 0.774 0.749 0.798
Exp. 2 diff eye 0.285 0.241 0.329
Exp. 2 diff resp 0.715 0.671 0.759
Exp. 2 same eye 0.778 0.752 0.805
Exp. 2 same resp 0.778 0.752 0.805
Exp.1&2 diff eye 0.287 0.259 0.316
Exp.1&2 diff resp 0.713 0.684 0.741
Exp.1&2 same eye 0.776 0.759 0.794
Exp.1&2 same resp 0.776 0.759 0.794
ttest results of resp acc by conv/diver
statistic p.value parameter ES
-4.7468 0 55 -0.6343

SM figure S13:

diverge % by explore/exploit trials P. Choice

paired ttest of diverge by explor/exploit by pchoice different source
statistic p.value parameter ES p. Choice source
-4.135 0 53 -0.563 Explicit
-16.685 0 55 -2.230 Gaze

SM Fig S14-18

These are additional analyses from the pre-registration of Exp. 3.

Hypothesis H3a: Confidence decreases following errors:

From the pre-reg “we anticipate that confidence will decrease after trials in which the participant’s response was incorrect as opposed to correct. This will be tested across participants using a paired t-test to compare confidence ratings following previous trials in which the response was correct/incorrect”

summary stats of confidence by prev. trial acc
mean_diff lo_ci hi_ci t df p.val ES
0.79 0.61 0.98 8.73 31 0 1.54

Fig. S14A

Pre-reg Exp. 3, H3a: confidence learning curve

“Further supporting the idea that confidence tracks learning we expect confidence to exhibit a learning curve. Within a block (a series of trials that follow the same underlying rule), we expect normalized confidence ratings to exhibit a learning curve that is characterized by a monotonic increase during the early trials until an asymptote is reached”

Fig. S14B

Pre-reg H3b: Confidence is increased for correct trials as opposed to incorrect trials:

In line with extensive work demonstrating human’s metacognitive abilities (Fleming & Daw, 2017; Maniscalco & Lau, 2012), we expect normalized confidence ratings to be increased for trials in which the prediction is in line with the underlying rule (i.e. rule accuracy). This will be tested across participants using a paired t-test to compare confidence on trials with correct/incorrect rule accuracy (see figure “Confidence by Rule Accuracy”).

Figure “Confidence by Rule Accuracy”. Comparing trials by their rule accuracy, we expect that across participants the mean confidence rating will be significantly higher for correct as opposed to incorrect trials.

summary stats of confidence by trial acc
is_acc accuracy_type variable n mean ci lo_ci hi_ci
0 Actual m_conf 32 0.36 0.15 0.20 0.51
1 Actual m_conf 32 0.53 0.15 0.38 0.68
0 Rule m_conf 32 0.02 0.17 -0.15 0.19
1 Rule m_conf 32 0.61 0.15 0.45 0.76
stats tests conf by trial acc
accuracy_type statistic df p effsize
Actual -4.828 31 0 -0.853
Rule -10.437 31 0 -1.845

Pre reg EXP 3: H3c: Explicit confidence & RW trajectory

“H3C: Confidence tracks the latent variable of previous trial’s Prediction Error (derived from R-W Model): In accordance with Hypothesis 3a, which states that confidence ratings will be updated by the previous trial’s accuracy, and in line with previous research that has found that confidence ratings track internal learning processes (Meyniel et al., 2015), we expect confidence ratings to be significantly correlated with the previous trial’s absolute prediction error. To test this for each participant we will correlate the trial’s Confidence Rating and the previous trial’s Prediction Error. We will then run a one-sample t-test on the participants’ Pearson correlations’ distribution. We hypothesize that the distribution will be significantly greater than zero and moderate . In addition to the previous trial’s prediction error as an exploratory analysis we will examine confidence rating’s correlation to the probability associated with the choice made (P. Choice) and the strength of the value associated with the choice (v; this is quantified as the distance of v from 0.5), that are also derived from the R-W model.”

correlation Confidence Ratings and RWS learning metrics
vars_compared mean_corr hi_CI lo_CI group_t_statistic group_pval ES n_sig total_sub
conf_PC 0.384 0.438 0.330 14.468 0 2.558 30 32
conf_dist_v 0.285 0.362 0.209 7.604 0 1.366 23 31
conf_prev_delta -0.266 -0.195 -0.337 -7.612 0 -1.346 22 32

Fig. S15

Exp 3: Pre_reg Hypothesis 4

4A: Gaze is associated with explicit prediction. Hypothesis 4a: Gaze direction is strongly associated with the direction of the explicit prediction: To examine how explicit responses and implicit ocular expectations are related, we will take the normalized gaze direction of the first 300 msec in the environment (see Figure “Association Between Explicit Predictions and Gaze”, left panel) and average it per trial and per Prediction (right/left). We then compare the effect of explicit Prediction on Normalized Gaze across participants using a paired t-test (see Figure “Association Between Explicit Predictions and Gaze”, right panel). We expect a significant effect of Prediction on Normalized Gaze.

Fig. S16A

Diff gaze direction by prediction Exp 3 (pre-reg)
mean_diff lo_ci hi_ci t df p.val ES
0.9 0.75 1.04 12.79 31 0 2.26

Fig. S16B

Gaze by Prediction, single-subject significance
exp_name n_sig total_n
Exp. 3 30 32

H4: Gaze Prediction is correlated with P. Choice

Exp. 3 correlation EM and learning metric RW (pre-reg subs)
exp_name vars_compared mean_corr hi_corr lo_corr group_t_statistic group_pval ES n_sig total_sub
Exp. 3 PC_pred_gz 0.070 0.104 0.037 4.258 0.000 0.753 NA NA
Exp. 3 da_pred_gz -0.051 -0.023 -0.078 -3.791 0.001 -0.670 NA NA
Exp. 3 distv5_pred_gz NaN NaN NaN 6.262 0.000 1.125 NA NA

Fig. S17

H4:B Gaze fulfills benchmarks of confidence

“Hypothesis 4b:”Gaze exhibits the three statistical hallmarks of confidence”: In brief (see https://osf.io/t49pj for full explanation), we will examine whether gaze reflects confidence and fulfills the three statistical hallmarks of confidence (Sanders et al., 2016)”

Exp 3 Conf benchmark 1 distribution of spearman correlation
exp_name mean_corr hi_corr lo_corr group_t_statistic group_pval ES
Exp. 3 0.1767 0.3595 -0.0062 1.9707 0.0577 0.3484

So this is almost sig. but not a very good way to test it…

Conf benchmark 1 with glmer
effect group term estimate std.error statistic p.value exp_name
fixed NA (Intercept) 1.04042 0.06487 16.03931 0 Exp. 1
fixed NA m_pred_gz 0.26507 0.04654 5.69518 0 Exp. 1
ran_pars sub_name sd__(Intercept) 0.24597 NA NA NA Exp. 1

Fig S18A

Visualize Benchmark 1

Fig S18B

Benchmark 2: folded X

Running with RWS

EXP 3. RWS lme results Benchmark 2
term npar AIC BIC logLik deviance statistic df p.value exp
bench2_full_exp3.res_rws 6 11933.21 11972.15 -5960.607 11921.21 1.906 1 0.1674 Exp. 3

Fig S18C

Confidence Benchmark 3: EXP3

Exp. 3 Benchmark 3 RWS (glmer)
effect group term estimate std.error statistic p.value exp_name
fixed NA (Intercept) 0.3266672 0.2311633 1.413144 0.1576133 Exp. 1
fixed NA pred_gz_bin -0.1611013 0.1439355 -1.119261 0.2630290 Exp. 1
fixed NA dist_v_bin 0.1741175 0.0641453 2.714422 0.0066392 Exp. 1
fixed NA pred_gz_bin:dist_v_bin 0.0830579 0.0410870 2.021512 0.0432268 Exp. 1

Fig. S19

Parameter recovery

parameter recovery correlation values
param_name var1 var2 cor statistic p conf.low conf.high method model
alpha sim rec 0.92 74.93 0 0.91 0.93 Pearson RW
beta sim rec 0.59 22.84 0 0.54 0.63 Pearson RW
alpha sim rec 0.73 33.85 0 0.70 0.76 Pearson RWS
beta sim rec 0.57 21.88 0 0.53 0.61 Pearson RWS
rho sim rec 0.71 31.61 0 0.67 0.74 Pearson RWS
eps sim rec 0.97 122.86 0 0.97 0.98 Pearson WSLS
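The parameter-recovery logic can be sketched as follows (a simplified Python sketch of a two-armed task: simulate choices from a Rescorla-Wagner agent with known parameters, refit by maximum likelihood, and correlate simulated with recovered values; the task structure, trial counts, and parameter ranges here are hypothetical, and the sticky-RW and WSLS variants are omitted):

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(5)
N_TRIALS = 300

def simulate(alpha, beta):
    """Simulate a two-armed bandit agent with a Rescorla-Wagner delta rule."""
    v = np.zeros(2)
    choices, rewards = [], []
    for _ in range(N_TRIALS):
        c = int(rng.random() < expit(beta * (v[1] - v[0])))  # softmax choice
        r = float(rng.random() < (0.8 if c == 1 else 0.2))   # option 1 pays more often
        v[c] += alpha * (r - v[c])                           # delta-rule update
        choices.append(c); rewards.append(r)
    return np.array(choices), np.array(rewards)

def neg_ll(params, choices, rewards):
    """Negative log-likelihood of the choices under RW + softmax."""
    alpha, beta = params
    v = np.zeros(2); ll = 0.0
    for c, r in zip(choices, rewards):
        p1 = expit(beta * (v[1] - v[0]))
        ll += np.log(max(p1 if c == 1 else 1 - p1, 1e-12))
        v[c] += alpha * (r - v[c])
    return -ll

sim, rec = [], []
for _ in range(20):
    a, b = rng.uniform(0.1, 0.6), rng.uniform(1.0, 8.0)  # ground-truth parameters
    ch, rw = simulate(a, b)
    fit = minimize(neg_ll, x0=[0.3, 3.0], args=(ch, rw),
                   bounds=[(0.01, 1.0), (0.1, 20.0)], method="L-BFGS-B")
    sim.append((a, b)); rec.append(tuple(fit.x))

sim, rec = np.array(sim), np.array(rec)
r_alpha, _ = stats.pearsonr(sim[:, 0], rec[:, 0])  # simulated vs. recovered alpha
r_beta, _ = stats.pearsonr(sim[:, 1], rec[:, 1])   # simulated vs. recovered beta
```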